Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix docker image to actually work #742

Merged
merged 5 commits into from
Oct 3, 2024
Merged

Conversation

bloodearnest
Copy link
Member

@bloodearnest bloodearnest commented Sep 27, 2024

The job-runner docker image is in a half working state. A couple of attempts have previously been made to figure out how to get it to be able to use the host's docker and file system to run jobs, but they were messy and also didn't fully work.

However, in docker 4.26.0, released Dec '23, they added support for --group-add, which made the permissions much simpler.

Now we are on 22.04, we have a more up to date docker, and docker-compose v2, so this is simpler change. Using this new features, this change cleans up some of the old approaches.
We use a simple trick to make bind mounts work - the bind mounts must use the same path inside the container as on the host. This avoids us needing to add new configuration options and logic, and seems a reasonable constraint.

  • Switch to using --mount instead of --volume
  • Remove docker-over-ssh experiment support
  • Move to explicit 22.04 base image
  • Fix docker image permissions
  • Fix docker configuration
  • Add functional test to github workflows

@bloodearnest bloodearnest force-pushed the over-the-docker-line branch 8 times, most recently from 6b8fc68 to 61c9e35 Compare September 30, 2024 13:41
@bloodearnest bloodearnest marked this pull request as ready for review September 30, 2024 14:17
The --mount option is more strict that the --volume option, and will
error if the host source directory does not exist, whereas --volume will
silently create a directory.  We want the strict behaviour, so this
changes to use --mount.

The motivation for this change is to support job-runner running in
docker, which requires that the paths it uses are correct from the
host's perspective.  Currently, a incorrect path will be sort of work,
as the directory would be created inside the container, but only later
would the issue arise.

https://docs.docker.com/engine/storage/bind-mounts/#differences-between--v-and---mount-behavior
Was slow, complex, and unused
This simplifies the docker container by removing all the previous
user/group management inside the container as it is not needed. Instead,
we used the much simpler solution of adding an additional group to the
running user that is the host's docker group id. This allows the
container to have permission to access the host's bind-mounted
/var/run/docker.sock docker, but the container itself can run as the
local running user, which DTRT with respect to file permissions on bind
mounts.
The job-runner container uses the host's docker service. This means that
when it bind mounts directories with `docker run`, the source path must
exist on the *hosts* file system.

We've fixed this already for tests, with PYTEST_HOST_TMP, but we had not
actually fixed it for actual using the docker image for real. Before the
previous change to use --mount rather than --volume, a job would
succeed, but the files would be in the wrong location. Now we have that
change, the job fails.

We use a simple if inelegant fix for this: we mount the STORAGE_BASE
directories inside the container at the same path as on the host. This
means that the same config value can be used for job-runner copying
files itself, or for passing the host's docker to mount.

As part of this, I changedthe way we supply these values to being in
one time generated docker/.env file. This is more discoverable, usable
outside of just, and more efficant than before. I chose to use
`docker/medium` and `docker/high` as the volume mount locations, for
simplicity, and also, so the the docker machinery is isolated from the
non-docker venv set up.

This include a basic `just docker/functional-test` command to spin up
a job-runner container and run an actual job on it.
Copy link
Contributor

@evansd evansd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always nice to see things getting simpler and more powerful at the same time!

Was a nice, well-explained series of commits as well. Could maybe do with editing the auto-generated PR title and description just to give more context for these changes.

jobrunner/executors/local.py Show resolved Hide resolved
docker/docker-compose.yaml Show resolved Hide resolved
docker/docker-compose.yaml Show resolved Hide resolved
@bloodearnest bloodearnest changed the title over the docker line Fix docker image to actually work Oct 3, 2024
@bloodearnest
Copy link
Member Author

Always nice to see things getting simpler and more powerful at the same time!

Was a nice, well-explained series of commits as well. Could maybe do with editing the auto-generated PR title and description just to give more context for these changes.

Arg, yes, sorry, have updated

@bloodearnest bloodearnest merged commit 4e543d9 into main Oct 3, 2024
20 checks passed
@bloodearnest bloodearnest deleted the over-the-docker-line branch October 3, 2024 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants